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Abstract 

To quantify the operational risk capital charge under the current regulatory framework 
for banking supervision, referred to as Basel II, many banks adopt the Loss Distribution 
Approach. There are many modeling issues that should be resolved to use the approach 
in practice. In this paper we review the quantitative methods suggested in literature 
for implementation of the approach. In particular, the use of the Bayesian inference 
method that allows to take expert judgement and parameter uncertainty into account, 
modeling dependence and inclusion of insurance are discussed. 

Keywords: operational risk; loss distribution approach; Bayesian inference; depen- 
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1 Operational risk under Basel II 



Under the current regulatory framework for the banking industry [1], referred to as Basel II, 
the banks are required to hold adequate capital against operational risk (OR) losses. OR 
is a new category of risk, in addition to market and credit risks, attracting capital charge 
and defined by Basel II [1, p. 144] as: "/■ ■ ■/ the risk of loss resulting from inadequate or 
failed internal processes, people and systems or from external events. This definition includes 
legal risk, but excludes strategic and reputational risk." Similar regulatory requirements for 
the insurance industry are referred to as Solvency 2. OR is significant in many financial 
institutions. Examples of extremely large OR losses are: Barings Bank (loss GBP 1.3 billion 
in 1995), Sumitomo Corporation (loss USD 2.6 billion in 1996), Enron (USD 2.2 billion in 
2001), and recent loss in Societe Generale (Euro 4.9 billion in 2008). In Basel II, three 
approaches can be used to quantify the OR annual capital charge C, see [1, pp. 144-148]: 

• The Basic Indicator Approach: C = Sj=i a = 0.15, where GI{j), j = 1, ..,n 
are the annual positive gross incomes over the previous three years. 

• The Standardised Approach: C* = | X]j=i ^^^ELi AG'/j(j), 0], where i = 1, . . . , 8 
are the factors for eight business lines (BL) listed in Tableland GIi{j), j = 1, 2, 3 are 
the annual gross incomes of the i-th BL in the previous three years. 

• The Advanced Measurement Approaches (AMA): a bank can calculate the capital 
charge using internally developed model subject to regulatory approval. 

A bank intending to use the AMA should demonstrate accuracy of the internal models within 
the Basel II risk cells (eight business lines times seven risk types, see Table [1]) relevant to 
the bank and satisfy some criteria, see [1, pp. 148-156], including: 

• The use of the internal data, relevant external data, scenario analysis and factors 
reflecting the business environment and internal control systems; 

• The risk measure used for capital charge should correspond to the 99.9% confidence 
level for a one-year holding period; 

• Diversification benefits are allowed if dependence modeling is approved by a regulator; 

• Capital reduction due to insurance is capped by 20%. 

A popular method under the AMA is the loss distribution approach (LDA). Under the 
LDA, banks quantify distributions for frequency and severity of OR losses for each risk cell 
(business line/event type) over a one-year time horizon. The banks can use their own risk 
cell structure but must be able to map the losses to the Basel II risk cells. There are various 
quantitative aspects of the LDA modeling discussed in several books [2-7] and various papers, 
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e.g. [8-10] to mention a few. The commonly used LDA model for calculating the total annual 
loss in a bank (occurring in the years t = 1, 2, . . .) can be formulated as 

z^.)it) = J2zM z,{t) = Y,xf{f). (1) 

j=i i=i 

Here, the annual loss Zj{t) in risk cell j is modeled as a compound process over one year with 
the frequency (annual number of events) Nj{t) implied by a counting process (e.g. Poisson 
process) and random severities xj*''(t), i = 1, . . . , Nj{t). Typically, the frequencies and 
severities are assumed independent. Estimation of the annual loss distribution by modeling 
frequency and severity of losses is a well-known actuarial technique, see e.g. Klugman et 
al. [11]. It is also used to model solvency requirements for the insurance industry, see e.g. 
Sandstrom [12] and Wiithrich and Merz [13]. Under the model ([T]), the capital is defined 
as the 0.999 Value at Risk (VaR) which is the quantile of the distribution for the next year 
annual loss 2'(,)(T + 1): 

\/ai?,(Z(.)(T + 1)) = F^-;,(^+i)(g) = inf{z : Pr[Z(.)(T + 1) > ^] < 1 - g} (2) 

at the level q = 0.999. Here, index T+1 refers to the next year and notation Fy ^(g) denotes 
the inverse distribution of a random variable (rv) Y. The capital can be calculated as the 
difference between the 0.999 VaR and expected loss if the bank can demonstrate that the 
expected loss is adequately captured through other provisions. If correlation assumptions can 
not be validated between some groups of risks (e.g. between business lines) then the capital 
should be calculated as the sum of the 0.999 VaRs over these groups. This is equivalent to 
the assumption of perfect positive dependence between annual losses of these groups. 

In this paper, we review some methods proposed in the literature for the LDA model 
([T]). In particular, we consider the Bayesian inference approach that allows to account for 
expert judgment and parameter uncertainty which are important issues in operational risk 
management. 

The paper is organised as follows. Section [2] describes the requirements for the data 
that should be collected and used for Basel II AMA. Section [3] and Section H] are focused 
on modeling truncated data and the tail of severity distribution respectively. Calculation of 
compound distributions is considered in Section [51 In Section [6], we review the estimation of 
the frequency and severity distributions using frequentist and Bayesian inference approaches. 
Combining different data sources (internal data, expert judgement and external data) is 
considered in Section [3 Modeling dependence and insurance are discussed in Section |8] and 
Section [9] respectively. Finally, Section [10] presents the estimation of the capital charge via 
full predictive distribution accounting for parameter uncertainty. 
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2 Data 



Basel II specifies requirement for the data that should be collected and used for AMA. In 
brief, a bank should have: internal data, external data and expert opinion data. In addition, 
internal control indicators and factors affecting the businesses should be used. Development 
and maintenance of OR databases is a difficult and challenging task. Some of the main 
features of the required data are summarized as follows. 

Internal data. The internal data should be collected over a minimum five year period to 
be used for capital charge calculations (when the bank starts the AMA, a three-year period 
is acceptable). Due to a short observation period, typically, the internal data for many risk 
cells contain few (or none) high impact low frequency losses. A bank must be able to map 
its historical internal loss data into the relevant Basel II risk cells in Table [H The data must 
capture all material activities and exposures from all appropriate sub-systems and geographic 
locations. A bank can have an appropriate reporting threshold for internal data collection, 
typically of the order of Euro 10,000. Aside from information on gross loss amounts, a bank 
should collect information about the date of the event, any recoveries of gross loss amounts, 
as well as some descriptive information about the drivers of the loss event. 

External data. A bank's OR measurement system must use relevant external data. These 
data should include data on actual loss amounts, information on the scale of business opera- 
tions where the event occurred, and information on the causes and circumstances of the loss 
events. Industry data are available through external databases from vendors (e.g. Algo Op- 
Data provides publicly reported OR losses above USD 1 million) and consortia of banks (e.g. 
ORX provides OR losses above Euro 20,000 reported by ORX members). The external data 
are difficult to use directly due to different volumes and other factors. Moreover, the data 
have a survival bias as typically the data of all collapsed companies are not available. Several 
Loss Data Collection Exercises (LDCE) for historical OR losses over many institutions were 
conducted and their analyses reported in the literature. In this respect, two papers are of 
high importance: Moscadelli [14] analysing 2002 LDCE and Dutta and Perry [15] analysing 
2004 LDCE where the data were mainly above Euro 10,000 and USD 10,000 respectively. 

Scenario Analysis/expert opinion. A bank must use scenario analysis in conjunction 
with external data to evaluate its exposure to high-severity events. Scenario analysis is 
a process undertaken by experienced business managers and risk management experts to 
identify risks, analyse past internal/external events, consider current and planned controls 
in the banks; etc. It may involve: workshops to identify weaknesses, strengths and other 
factors; opinions on the impact and likelihood of losses; opinions on sample characteristics 
or distribution parameters of the potential losses. As a result some rough quantitative 
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assessment of risk frequency and severity distributions can be obtained. Scenario analysis 
is very subjective and should be combined with the actual loss data. In addition, it should 
be used for stress testing, e.g. to assess the impact of potential losses arising from multiple 
simultaneous loss events. 



Business environment and internal control factors. A bank's methodology must 
capture key business environment and internal control factors affecting OR. These factors 
should help to make forward-looking estimation, account for the quality of the controls and 
operating environments, and align capital assessments with risk management objectives. 



3 A note on modeling truncated data 

As mentioned above, typically internal data are collected above some low level of the order 
of Euro 10,000. Generally speaking, omitting data increases uncertainty in modeling but 
having a reporting threshold helps to avoid difficulties with collection of too many small 
losses. Often, the data below a reported level are simply ignored in the analysis, arguing that 
the capital is mainly determined by the low frequency heavy tailed severity risks. However, 
the impact of data truncation for other risks can be significant. Even if the impact is small 
often it should be estimated to justify the reporting level. Recent studies of this problem 
include Frachot et al. [9], Bee [16], Chernobai et al. [17], Mignola and Ugoccioni [18], Luo et 
al. [19], and Baud et al. [20]. A consistent procedure to adjust for missing data is to fit the 
data above the threshold using the correct conditional density. To demonstrate, consider 
one risk cell only, where the loss events follow a Poisson process, so that the annual counts 
N{t), t — 1, . . . ,T are independent and Poisson distributed, Poisson{X), with the probability 
function 

\ k 

p{k\X) = Pr[N{t) = A;] = — exp(-A), A > 0, A; = 0, 1, . . . . (3) 

Assume that the severities X'^^\t) are all independent and identically distributed (iid) from 
the density f{x\(3) whose distribution is denoted F{x\f3), where /3 is a vector of distribution 
parameters. Also, assume that the counts and severities are independent. Then the loss 
events above the level L have iid counts N(t) from Poisson{XL), Xl = A(l — F{L\f3)) and 
iid severities X^'^\t) from the conditional density 

/l W) = L<x<oc. (4) 

The joint density (likelihood) of the data Y over a period of T years (all counts N{t) 
and severities X«(t), t = l,.. .,N{t), t = 1, . . . , T), at N{t) = n{t) and X«(t) = x^'^t), is 
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T n{t) 

^(y|/3,A) = nP(^WIMl-^(^l/3)))n/45^^^ni)l/3). (5) 

t=l 1=1 

Note that here, the conditional density /j;^(x|/3), rather than f(x\j3), is used. The parameters 
(/3, A) can be estimated, for example, by maximizing the likelihood ([5]) and their uncertainties 
can be estimated using the second derivatives of the log-likelihood; see Section 16.11 Then 
estimated frequency Poisson{X) and severity f{x\f3) densities are used for the annual loss 
calculations. 

In the case of constant threshold, the maximum likelihood estimators (MLEs) for pa- 
rameters f3 and A can be calculated marginally, i.e. /3 is calculated by maximizing the 
likelihood of the severities; Aj;, is calculated using the average of the observed counts; and 
finally A = Al/(1 — F(L|/3)). However, calculation of their uncertainties will require the use 
of the full joint likelihood ([5]). If the observed losses are scaled before fitting or the reporting 
threshold has changed over time then one should consider a model with the threshold vary- 
ing in time studied in [21]. In this case the joint estimation of the frequency and severity 
parameters using full likelihood of the data is required even for parameter point estimators. 

Of course, the assumption in the above approach is that missing losses and reported 
losses are realizations from the same distribution. Thus the method should be used with 
extreme caution if a large proportion of data is missing. 

Ignoring missing data. Ignoring missing data will have an impact on risk estimates. For 
example, using data reported above the threshold, one can fit Poisson^XLj frequency and 
fit the severity using: 

• ^^naive moder - f{x\f3); 

• ''shifted moder - f{x-L\f3); 

• ''truncated modeF - /l(x|/3). 

Calculation of the annual loss quantile using incorrect frequency and severity distributions 
will induce a bias. Figure [1] and Figure [2] show the relative bias in the 0.999 annual loss 
quantile (relative difference between the 0.999 quantiles under the false and true models) vs a 
fraction of truncated points for the cases of light and heavy tail severities respectively. In this 
example, the severity is from Lognormal distribution, LN{ix,a), i.e. log-severity lnX^*^(t) is 
from the Normal distribution, Normal{fi,a), with mean fi and standard deviation a. The 
parameter values are chosen the same as some cases considered in [19]. In particular: Figure 
[U is the case of Xl = 10 and cr = 1; and Figure [2] is the case of A^ = 10 and cr = 2. 
The latter corresponds to the heavier tail severity. Here, the calculated bias is due to the 
model error only, i.e. corresponds to the case of a very large data sample. Also note, that 
the actual value of the scale parameter n is not relevant because only relative quantities 
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are calculated. ^^Naive model" and ^^shifted model" are easy to fit but induced bias can be 
very large. Typically: "naive model" leads to a significant underestimation of the capital, 
even for a heavy tail severity; "shifted model" is better than "naive model" but worse then 
"truncated model" ; the bias from "truncated model" is less for heavier tail severities. 



4 Modeling severity tail 

Due to simple fitting procedure, one of the popular distributions to model severity is Log- 
normal. It is a heavy tail distribution, i.e. belongs to the class of so-called sub-exponential 
distributions where the tail decays slower than any exponential tail. Often, it provides a 
reasonable overall statistical fit as reported in the literature, see e.g. [15]. Also, it was sug- 
gested for OR at the beginning of Basel II development, sec [22, p. 34]. However, due to the 
high quantile level requirement for OR capital charge, accurate modeling of extremely high 
losses (the tail of severity distribution) is critical and other heavy tail distributions are often 
considered to be more appropriate. Two studies of OR data collected over many institu- 
tions arc of central importance here: Moscadelli [14] analysing 2002 LDCE (where Extreme 
Value Theory (EVT) is used for analysis in addition to some standard two parameter distri- 
butions) and Dutta and Perry [15] analysing 2004 LDCE. The latter paper considered the 
four-parameter g-and-h and GB2 distributions as well as EVT and several two parameter 
distributions. 



EVT-threshold exceedances. There are two types of EVT models: traditional block 
maxima (modeling the largest observation) and threshold exceedances. The latter is often 
used to model the tail of OR severity distribution and is briefly described below; for more 
details see McNeil et al. [6] and Embrechts et al. [23]. Consider a rv X, whose distribution 
is Pr[X < x] — F{x). Given a threshold u, the exceedance of X over u is distributed from 

FM = P4X-u<y\X>u] = ^^^ + %;y . (6) 

1 — r [u) 

Under quite general conditions, as the threshold u increases, the excess distribution Fu{.) 
converges to a generalized Pareto distribution (GPD) 

\ 1 - oM-y/Py, « = 0. ^' 

That is we can find a function /3(ti) such that 

lim sup \Fu{y) - G^^p(^u){y) \ = 0, 

u->a o<y<a-u 

where a < oo is the right endpoint of F{x), ^ is the shape parameter and /3 > is the scale 
parameter. Also, y > when ^ > and < y < -P/C when ^ < 0. The GPD case ^ = 
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corresponds to an exponential distribution. If ^ > 0, the GPD is heavy tailed and some 
moments do not exist. In particular, if ,^ > 1/m then the m-th and higher moments do not 
exist. The analysis of OR data in [14] reported even the cases of > 1 for some business 
lines, i.e. infinite mean distributions; also see discussions in Neslehova et al. [24]. It seems 
that the case of ^ < is not relevant to modeling OR as all reported results indicate non- 
negative shape parameter. Though, one could think of a risk control mechanism restricting 
the losses by an upper level and then the case of ^ < might be relevant. 

In the context of OR, given iid losses X^, A; = 1, 2, . . . , if one can chose a threshold u 
and model the losses above the threshold using GPD ([7]) and the losses below using empirical 
distribution, i.e. 

Fix) ~ / ^^'^^^ " ^^^^ " ^""^^^^ ^ ^ ~ (8) 

^""^^X F^{x) = ^ZtiHX^'^<xy, x<u. 

Here, /(.) is an indicator function. There are various ways to fit the GPD parameters includ- 
ing the Bayesian inference and maximum likelihood methods; see Section |6] and Embrechts 
et al. [23, Section 6.5]. The approach ([8]) is a so-called splicing method when the density is 
modeled as 



/(x) = + ^2/2(2;), Wi+W2 = l, (9) 

where fi{x) and f2{x) are proper density functions defined on a; < L and x > L respectively. 
In dH]), fi{x) is modeled by the empirical distribution but one may choose a parametric 
distribution instead. Splicing can be viewed as a mixture of distributions defined on non- 
overlapping regions while a standard mixture distribution is a combining of distributions 
defined on the same range. The choice of the threshold u is critical, for details of the 
methods to choose a threshold we refer to [23, Section 6.5]. 

g-and-h, GB2 and GCD distributions. A rv X is said to have g-and-h distribution if 

X = a + fc ^^P(^^) - ^ exp(/irV2), (10) 

where F is a rv from the standard Normal distribution and (a, b, g, h) are the parameters. A 
comparison of the g-and-h with EVT was studied in Degen et al. [25]. It was demonstrated 
that for the g-and-h distribution, convergence of the excess distribution to the GPD is 
extremely slow. Therefore, quantile estimation using EVT may lead to inaccurate results if 
data are well modeled by the g-and-h distribution. 

GB2 (the generalized Beta distribution of the second kind) is another four-parameter 
distribution that nests many important one- and two-parameter distributions. Its density is 
defined as 
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•^^^^ " ¥PB{p,q)[l + {x/hY]P+'i' ^ ^ °' ^^^^ 
where -B(p, g) is the Beta function and {a,b,p,q) are parameters. Both g-and-h and GB2 four 
parameter distributions were used in [15] as the ahernative to EVT. 

A convenient distribution recently suggested for OR in [26] is Generahzed Champernowne 
distribution (GCD) with the density defined as 

' ((a; + c)- + (M + c)"-2c«)2' ^'^> 

and parameters q;>0,M>0,c>0. It behaves as Lognormal in a middle and as Pareto in 
the tail. 



5 Calculating compound distribution 

If the severity and frequency distributions and their parameters are known then, in general, 
the distribution H[.) of the annual loss 

N 

Z = ^X» (13) 

1=1 

should be calculated numerically. Here, we assume that severities X*^*) are iid from the 
distribution F{x) and the frequency iV, with pn = Pr[iV = n], is independent from the 
severities. Most popular numerical methods are Monte Carlo, Panjer recursion and FFT 
methods described below. Also, there are several approximations discussed below too. 

5.1 Monte Carlo method. 

The easiest to implement is Monte Carlo method with the following logical steps: 

1. For A; = 1,...,X 

(a) Simulate the annual number of events N from the frequency distribution. 

(b) Simulate independent severities X'^^\ . . . , X^^'> from the severity distribution. 

(c) Calculate = E^Ii^^'^- 

2. Next k 

Obtained Z^^\ . . . , Z^^^ are samples from a compound distribution H{.). The 0.999 quantile 
and other distribution characteristics can be estimated using the simulated samples in the 
usual way. Denote the samples Z^^^ , . . . , Z'^^^ sorted into ascending order as Z'^^^ < ■ ■ ■ < 
Z'^^\ then the quantile H~^{q) can be estimated by Z^L-^^^+^J^ Here, [_.J denotes rounding 
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downward. The numerical error (due to the finite number of simulations K) in the quantile 
estimator can be assessed by forming a conservative confidence interval [Z'^'^\ Z^'^^] to contain 
the true value with probability 7. This can be done by utilizing the fact that the number 
of samples not exceeding the quantile H^^{q) has a Binomial distribution with parameters 
q and K (i.e. with mean = Kq and var = Kq{l — g)). Approximating the Binomial by the 
Normal distribution leads to a simple formula: 

r=L/J, / = irg-F/((l+7)/2)v/^^(r^ 
s=\u\, u = Kq + F^\{1 + ^)/2)^Kq{l - q), 

where [.] denotes rounding upwards and is the standard Normal distribution. The 
above formula works very well for Kq{l — q) ^ 50. 

A large number of simulations, typically K > 10^, should be used to get a good numerical 
accuracy for the 0.999 quantile. However, a priori, the number of simulations required to 
achieve a specific accuracy is not known. One of the approaches is to continue simulations 
until a numerical error, calculated using (1141) . achieves the desired level. 



5.2 FFT and Panjer recursion. 

Although Monte Carlo is straightforward and robust, it is slow to get accurate results. High 
precision results are especially important for sensitivity studies, where the first or even the 
second order derivatives are involved. Fast Fourier Transform (FFT) and Panjer recursion 
are the other two popular alternatives to calculate the distribution of compound loss (I13p . 
Both have a long history, but their applications to computing very high quantiles of the 
compound distributions in the case of high frequencies or heavy tail severities are relatively 
recent developments in the area of quantitative risk. 

Panjer recursion. The Panjer recursion is based on calculating the compound distribution 
via convolutions. Using a well known fact that the distribution of the sum of two independent 
continuous random variables can be calculated as convolution, the compound distribution of 
the annual loss Z can be calculated as 

00 00 
H{z) = Pt[Z < z\N = k] Pt[N = k] = 'YpkF^''\z), (15) 

k=0 k=0 

where F^''\z) = Pr[Xi + ■ ■ ■ + < z] is the kth convolution of F calculated recursively as 

F^^\z)= [ F^''^^\z-x)f{x)dx 
Jo 

with F^^\z) = 1, 2; > and F^'^\z) = 0, z < 0. Note that integration limits are and z 
because severities considered are nonnegative. Thought the formula is analytic, its direct 
calculation is difficult as the convolution powers are not available in closed form in general. 
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Panjer recursion is a very efficient method to calculate (|T5|) in the case of Poisson, negative 
binomial and binomial frequency distributions and discrete severities. More precisely, the 
frequency distribution should belong to a Panjer class (a, 6, 1), i.e. to satisfy 

Pn = {a + b/n)pn-i, for n>2. (16) 

The continuous severity distribution F{x) can be discretized on {0, A, 2A, . . .}, by choosing 
a unit A > and defining discrete density = Pr[X = nA] as e.g. 

/o = F(A/2), /„ = F(nA + A/2)-F(nA-A/2), n=l,2,.... (17) 
Then the discrete compound density /i„ = Pt[Z = nA] can be calculated recursively as 

{pi - (a + b)po)U + J2]=i (« + bj/n) fjK-j 

K = z -7 , n>l (18) 

J- — a/o 

oo 

with ho = {foYvk- The corresponding discrete distribution converges to the true contin- 

fc=0 

uous distribution H{z) weakly as A ^ and the number of required operations is of the 
order of O(n^) (in comparison with 0{7r') of explicit convolution). Detailed description of 
this method can be found in [5, Section 6.6] and generalized versions of Panjer recursion are 
discussed in [27], [28], [29]. 

FFT method. The characteristic function (CP) of the compound loss Z can be calculated 
as 



oo 



xit) = ^[v{t)TPk = ^{vit)\ (19) 

fc=0 

where ip{t) is the CP of the severity density /(x): ip{t) = /(x) exp{itx)dx and ipis) is the 
probability generating function of the frequency distribution: ipi^) = 'YlV=Q^'^Pk- For exam- 
ple, in the case of distributed from Poisson{X), x(t) = exp{Xip(t) — X). Given CP, the den- 
sity of Z can be calculated via the inverse Pourier transform as h{z) = ^ x(t) exp{—itz)dt. 
Por severity discretized as (/o, /i, . . . , /a/_i), e.g. using f[T7|) . the continuous Pourier trans- 
form is reduced to the discrete Pourier transformation (DPT) 

A/-1 /2 m \ 

</^fc=$^/mexp(^A;j, fc = 0,l,...,M-l. (20) 

m=0 ^ ^ 

Then the compound loss discrete CP Xk = V'(v'fc)) = 0, . . . , M — 1 is calculated and the 
discrete density hk is recovered from Xk by the inverse transformation 



1 f 2nk \ 

^«=mZ^^'^^^p(-1^^)' ^ = 0,1,...,M-1. (21) 

k=0 ^ ^ 
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FFT is a method that allows to calculate the above DFTs (|20ll2TD efficiently using 0(M logs M) 
operations, when M = T' with a positive integer r; see e.g. [30, Chapter 12] for details and 
code. A commonly recognised pitfall of FFT in evaluating compound distribution, which was 
recently studied in great detail in [31,32], is the so-called aliasing error. To explain, assume 
that there is no truncation error in severity discretisation, then FFT procedure calculates the 
compound distribution on m = 0, . . . , M — 1, i.e. the mass of compound distribution beyond 
M — 1 is "wrapped" into the range m = 0, . . . , M — 1. This error is larger for heavy tailed 
severities. Choosing M much larger than the required quantile to reduce this error is very 
inefficient and the recommended procedure is to use tilting^ the transformation increasing the 
severity tail decay that commutes with convolution. The tilting transforms the severity as 
jj = exp{—j9)fj, j = 0, . . . , M — 1. Then FFT calculation of the compound distribution 
(12011211) is performed using transformed severity and the final result hj is adjusted to get 
the compound distribution hj = hj exp{6j). It was reported that the choice 6 ~ 20/M for 
standard double precision (8 bytes) calculations works well. 

The Panjer recursion has often been compared with FFT, and it is accepted that the former 
is slower if the grid size is large, see e.g. [31]. Both methods have discretization error. Note 
that, Panjer recursion has no truncation error presented in FFT. Also note that, FFT can 
be used in general for any frequency and severity distributions while Panjer recursion is 
restricted to non-negative severities and a special class of frequency distributions. Typically, 
both methods are faster than Monte Carlo by a factor of few orders. Other methods to 
calculate the compound distribution include direct integration of the CF [33] and a hybrid 
method combining Panjer recursion, importance sampling and trans-dimensional Markov 
chain Monte Carlo considered in [34]. 

5.3 Closed-form approximation 

The moments of compound distributions can be expressed via the moments of frequency 
and severity distributions. This can be utilized to approximate the compound loss using e.g. 
Normal or translated Gamma distributions by matching two or three moments respectively, 
see e.g. [6, Section 10.2.3]. Of course for heavy tailed distributions and high quantiles these 
approximations do not work well (note that low order moments may not even exist for some 
ORs). If the severities X^^^ are iid from the sub-exponential (heavy tail) distribution F{.), 
then the tail of the compound distribution H{.) is related to the severity tail as 

1-H{z) ^ E[N]{1-F{z)), z^oo, (22) 

see e.g. [23, Theorem 1.3.9]. Here, "~" means that the ratio of the left- and right-hand sides 
converge to 1. The validity of this asymptotic result was demonstrated for the cases when 
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is distributed from Poisson, binomial or negative binomial. This approximation can be 
used to calculate the quantiles of the annual loss as 

yai?,(Z) ~ - , q^l. (23) 

For application in the OR context, see Bocker and Kliippelberg [35]. Under the assumption 
that the severity has finite mean, Bocker and Sprittulla [36] derived a correction reducing 
the approximation error of fl23|) . 

6 Model fitting 

Estimation of the frequency and severity distributions is a challenging task, especially for 
low frequency high impact losses, due to very limited data for some risks. The main tasks 
involved into the fitting are: finding the best point estimates for the distribution parameters, 
quantification of the parameter uncertainties, assessing the model quality (model error). In 
general, these tasks can be accomplished by undertaking either the so-called frequentist or 
Bayesian approaches briefly discussed below. 

6.1 Frequentist approach 

Fitting distribution parameters using data via the frequentist approach is a classical problem 
described in many textbooks. For the purposes of this review it is worth to mention sev- 
eral aspects and methods. Firstly, under the frequentist approach one says that the model 
parameters are fixed while their estimators have associated uncertainties that typically con- 
verge to zero when a sample size increases. Several popular methods to fit parameters of the 
assumed distribution are: 

• method of moments - matching the observed moments; 

• matching certain quantiles of empirical distribution; 

• maximum likelihood - find parameter values that maximize the joint likelihood of data; 

• estimating parameters by minimizing a certain distance between empirical and theo- 
retical distributions, e.g. Anderson-Darling or other statistics, see Ergashev [37]. 

The most popular approach is the maximum likelihood method. Here, given the model pa- 
rameters 6 = (6'i, 62, . . .), assume that the joint density (likelihood) of data Y = (Fi, . . . , F„) 
is known in functional form ^{y\0). Then the maximum likelihood estimators (MLEs) 0^'^^^ 
are the values of the parameters maximizing l{y\0). Often it is assumed that Yi, . . . ,Yn 

n 

are iid from /(.|^); then the likelihood is i{y\0) = Y[ fiViW)- The uncertainty of the MLEs 

1=1 

can be estimated using the asymptotic result that: under suitable regularity conditions, as 



13 



the sample size increases, 0^'^^^ converges to and is Normally distributed with the mean 
and covariance matrix n~^I{0)^^ , where 

liO)km = ~E[d'\iii(Y\0)/d9kd9m] (24) 

is the expected Fisher information matrix. For precise details on regularity conditions and 
proofs see e.g Lehmann [38, Theorem 6.2.1 and 6.2.3]. The required regularity conditions 
are mild but often difficult to prove. Also, whether a sample size is large enough to use 
this asymptotic result is another difficult question to answer in practice. Often, ( 12^ is 
approximated by the observed information matrix 

- -dHni{y\0)/d9kd9rr, = -- J^" In f{y,\0)/d9kd9^ (25) 

for a given realization of data y. This should converge to the matrix ( 12^ by the law of 
large numbers. Note that the mean and covariances depend on the unknown parameters 
and are usually estimated by replacing with 0^^^^ for a given realization of data. This 
asymptotic approximation may not be accurate enough for small samples. 

Another common way to estimate the parameter uncertainties is Bootstrap method. It 
is based on generating many data samples of the same size from the empirical distribution 
of the original sample and calculating the parameter estimates for each sample to get the 
distribution of the estimates. For a good introduction to the method we refer the reader to 
Efron and Tibshirani [39]. 

Usually maximization of the likelihood (or minimization of some distances in other meth- 
ods) should be done numerically. Popular numerical optimization algorithms include: sim- 
plex method, Newton methods, expectation maximization (EM) algorithm, simulated an- 
nealing. It is worth to mention that the last is attempting to find a global maximum while 
other methods are designed to find a local maximum only. For the latter, using different 
starting points helps to find a global maximum. Typically, EM is more stable and robust 
than the standard deterministic methods such as simplex or Newton methods. To assess 
the quality of the fit, there are several popular goodness of fit tests including Kolmogorov- 
Smirnov, Anderson-Darling and chi-square tests. Also, the likelihood ratio test and Akaike's 
information criterion are often used to compare models. Again, detailed descriptions of the 
above mentioned methodologies can be found in many textbooks; for application in OR 
context see e.g. Panjer [5]. 

6.2 Bayesian inference approach 

There is a broad literature covering Bayesian inference and its applications for the insurance 
industry as well as other areas. For a good introduction to the Bayesian inference method, 
see Berger [40]. In our opinion, this approach is well suited for OR, though it is not often 
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used in the OR literature; it was briefly mentioned in books Cruz [3] and Panjer [5] and was 
applied to OR modeling in several recent papers referred to below. To sketch the method, 
consider a random vector of data Y = (Yi,...,l^) whose density, for a given vector of 
parameters = {6i, 62, ■ ■ ■), is i{y\6). In the Bayesian approach, both data and parameters 
are considered to be random. A convenient interpretation is to think that the parameter is 
a rv with some distribution and the true value (which is deterministic but unknown) of the 
parameter is a realization of this rv. Then Bayes' theorem can be formulated as 

h{y,e)=i{y\e)TT{e) = n{e\y)h{y), (26) 

where tt{0) is the density of parameters (a so-called prior density); 7c{6\y) is the density of 
parameters given data Y (a so-called posterior density); h{y,6) is the joint density of the 
data and parameters; i{y\0) is the density of data for given parameters (likelihood); and h{y) 
is a marginal density of Y. If tt{6) is continuous then h{y) = J i{y\0)TT{6)dO and if tc{0) is a 
discrete, then the integration should be replaced with a corresponding summation. Typically, 
7r(0) depends on a set of further parameters, the so-called hyper-parameters, omitted here 
for simplicity of notation. The choice and estimation of the prior will be discussed in Section 
17.21 Using (!26|) . the posterior density can be written as 

7r{e\y)=i{y\e)7r{e)/h{y). (27) 

Here, h{y) plays the role of a normalization constant and the posterior can be viewed as a 
combination of a prior knowledge contained in ^{0) with the data likelihood i{y\6). 

If the data Yi, . . . ,Yn are conditionally (given 6) iid then the posterior can be calculated 
iteratively, i.e. the posterior distribution calculated after k — 1 observations can be treated 
as a prior distribution for the k-th observation. Thus the loss history over many years is 
not required, making the model easier to understand and manage, and allowing experts to 
adjust the priors at every step. 

In practice, it is not unusual to restrict parameters. In this case the posterior distribution 
will be a truncated version of the posterior distribution in the unrestricted case. For example, 
if we identified that 6 is restricted to some range [Ol,Oh] then the posterior will have the 
same type as in the unrestricted case but truncated outside this range. 

Sometimes the posterior density can be calculated in closed form. This is the case for 
the so called conjugate prior distributions where the prior and posterior distributions are of 
the same type, for a precise definition, see e.g. [40, Section 4.2.2, p. 130] 

The mode, mean or median of the posterior 7r(0|y) are often used as point estimators 
for the parameter 6, though in OR context we recommend the use of the whole posterior as 
discussed in Section Popular model selection criteria include the Deviance Information 
Criterion (DIC) and Bayes Information Criterion (BIC); see e.g. Peters and Sisson [41] in 
the OR context. 
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Gaussian approximation. A Gaussian approximation for the posterior 7r(0|y) is obtained 
by a second order Taylor series expansion around the mode 



ln7r(0|y) ^ ln7r(^|y) + \Y1 ^ ^ZbT^ - " ^^^^^^ " ^^8) 




if the prior is continuous at 0. Under this approximation, 7r(0|y) is a multivariate Normal 
distribution with the mean and covariance matrix S = I^^, 



(I) 



\nn{e\y)/d9id9^ 



i\e=e ■ 



In the case of improper constant priors, this approximation compares to the Gaussian ap- 
proximation for the MLEs (l25l) . Also, note that in the case of constant priors, the mode 
of the posterior and MLE are the same. This is also true if the prior is uniform within a 
bounded region, provided that the MLE is within this region. 

Markov chain Monte Carlo methods. In general, estimation (sampling) of the pos- 
terior numerically can be accomplished using Markov chain Monte Carlo (MCMC) meth- 
ods, see e.g. Robert and Casella [42, Sections 6-10] for widely used Metropolis-Hastings 
and Gibbs sampler algorithms. In particular. Random Walk Metropolis-Hastings (RW-MH) 
within Gibbs algorithm is easy to implement and often efficient if the likelihood function 
can be easily evaluated. It is referred to as single- component Metropolis-Hastings in Gilks 
et al. [43, Section 1.4]. The algorithm is not well known among OR practitioners and we 
would like to mention its main features; see e.g. [21] for application in the context of OR 
and Peters et al. [44] for application in the context of a similar problem in the insurance. 
The RW-MH within Gibbs algorithm creates a reversible Markov chain with a stationary 
distribution corresponding to our target posterior distribution. Denote by 0^"^^ the state of 
the chain at iteration m. The algorithm proceeds by proposing to move the ith parameter 
from the current state 6^"^ to a new proposed state 6* sampled from the MCMC proposal 
transition kernel. Then the proposed move is accepted according to some rejection rule de- 
rived from a reversibility condition. Note that, here the parameters are updated one by one 
while in a general Metropolis-Hastings algorithm all parameters are updated simultaneously. 
Typically, the parameters are restricted by simple ranges, 9i G [ai,6i], and proposals are 
sampled from Normal distribution. Then the logical steps of the algorithm are as follows: 



1. Initialize °\ i 



1, . . . , / by e.g. using MLEs. 



2. For m = 1, . . . ,M 



(a) Set = 
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(b) For z = 1, . . . , / 

(c) Sample proposal 6* from the transition kernel, e.g. from the truncated Normal 
density 



where fiy{x\fi, a) and F/v(x|/i, a) are the Normal density and its distribution with 
mean fi and standard deviation a. 



(d) Accept proposal with the acceptence probability 

I »(e<»"|y)/rWI».'"",ff.)J 

where 0* = (^[""^ . . . , O^^l, 6*, 6'-'"~^\ . . .), i.e. simulate U from the uniform (0,1) 
and set 9^"^^ = 9* iiU < p(6^'^\ 0*). Note that, the normalization constant of the 
posterior (1271) does not contribute here. 

(e) Next i 
3. Next m 

This procedure builds a set of correlated samples from the target posterior distribution. One 
of the most useful asymptotic properties is the convergence of ergodic averages constructed 
using the Markov chain samples to the averages obtained under the posterior distribution. 
The chain has to be run until it has sufficiently converged to the stationary distribution 
(posterior distribution) and then one obtains samples from the posterior distribution. The 
RW-MH algorithm is simple in nature and easy to implement. General properties of this 
algorithm can be found in e.g. [42, Section 7.5]. However, for a bad choice of the proposal dis- 
tribution, the algorithm gives a very slow convergence to the stationary distribution. There 
have been several recent studies regarding the optimal scaling of the proposal distributions 
to ensure optimal convergence rates, see e.g. Bedard and Rosenthal [45]. The suggested 
asymptotic acceptance rate optimizing the efficiency of the process is 0.234. Usually it is 
recommended that the cTj in (!29|) are chosen to ensure that the acceptance probability is 
roughly close to 0.234 (this requires some tuning of the cxj prior to final simulations). 



7 Combining different data sources 

Basel II AMA requires (see [1, p. 152]) that: "ylny operational risk measurement system must 
have certain key features to meet the supervisory soundness standard set out in this section. 
These elements must include the use of internal data, relevant external data, scenario anal- 
ysis and factors reflecting the business environment and internal control systems'"^ . 
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Combining these different data sources for model estimation is certainly one of the main 
challenges in OR. As it was emphasized in the interview with several industry's top risk 
executives [46]: "/. . ./ Another big challenge for us is how to mix the internal data with 
eocternal data; this is something that is still a big problem because I don't think anybody has 
a solution for that at the moment" and "What can we do when we don't have enough data 
[. . .] How do I use a small amount of data when I can have external data with scenario 
generation? [. . .] I think it is one of the big challenges for operational risk managers at 
the moment" . 

Often in practice, accounting for factors reflecting the business environment and internal 
control systems is achieved via scaling of data. Then ad-hoc procedures are used to combine 
internal data, external data and expert opinions. For example: 

• Fit the severity distribution to the combined samples of internal and external data and 
fit the frequency distribution using internal data only. 

• Estimate the Poisson annual intensity for the frequency distribution as wXint + (1 — 
w)\cxt, where the intensities Xext and Xint are implied by the external and internal data 
respectively, using expert specified weight w. 

• Estimate the severity distribution as a mixture wiFsa{x)+W2Fi{x)+{1—wi—W2)Fe{x), 
where Fsa{x), Fi{x) and Fe{x) are the distributions identified by scenario analysis, 
internal data and external data respectively, using expert specified weights Wi and W2- 

• Minimum variance principle - the combined estimator is a linear combination of the 
individual estimators obtained from internal data, external data and expert opinion 
separately with the weights chosen to minimise the variance of the combined estimator. 

Probably the easiest to use and flexible procedure is minimum variance principle. The 
rationale behind the principle is as follows. Consider two unbiased independent estimates 
d''^^ and d'^'^^ for parameter 9, i.e. E[6''''^] = 9 and var(^*^'^)) = a^, k = 1, 2. Then the combined 
unbiased linear estimator and its variance are 

Otot = wiO^^^ + W2e^^^ , wi + ^2 = 1; ^gQ^ 
var(^tot) = wlal + (1 - wi) . 
It is easy to find that the weights 

2 2 

minimize var(^tot)- These weights behave as it is expected in practice. In particular, \ 
\i (j\la\ — > {ci\l(j\ is the uncertainty of the estimator Q^^^ over the uncertainty of Q^^^) 
and —> if erf /erf 0. This method can easily be extended to combine three or more 
estimators: 
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etot = w,P^ + ... + WKe^''\ w, + ... + WK = l; (31) 

with 

^^= sr^K^n, 2V ^ = 1'---'^ (32) 

minimizing va.T{6tot)- Heuristically, it can be applied to almost any quantity e.g. distribu- 
tion parameter or distribution characteristic such as mean, variance, etc. The assumption 
that the estimators are unbiased estimators for 6 is probably reasonable when combining 
estimators from different experts (or from expert and internal data). However, it is certainly 
questionable if applied to combine estimators from the external and internal data. Below, 
we focus on the Bayesian inference method that can be used to combine these data sources 
in a consistent statistical framework. 



7.1 Bayesian Inference to combine two data sources 

Bayesian inference is a statistical technique well suited to combine different data sources for 
data analysis; for application in OR context, see Shevchenko and Wiithrich [47]. For the 
closely related methods of credibility theory, see Biihlmann and Gisler [48] and Biihlmann 
et al. [49]. 

As in Section [6l2l consider a random vector of data Y = (Yi, . . . , Yn) whose density, for 
a given vector of parameters = {6i, ... ,6j), is £(y |0). Then the posterior distribution (1271) 
is 



7r{d\y)^i{y\e)ni0). (33) 

Hereafter, oc is used for statements with the relevant terms only. The prior distribution 
7r(0) can be estimated using appropriate expert opinions or using external data. Thus the 
posterior distribution 7r(0|y) combines the prior knowledge (expert opinions or external 
data) with the observed data using formula fl33|) . In practice, we start with the prior 7r(0) 
identified by expert opinions or external data. Then, the posterior 7r(0|y) is calculated using 
fl55]) when actual data are observed. If there is a reason (e.g. a new control policy introduced 
in a bank), then this posterior can be adjusted by an expert and treated as the prior for 
subsequent observations. Examples are presented in [47]. 

As an illustrative example, consider modeling of the annual counts using Poisson distri- 
bution. Suppose that, given A, data N = (A^(l), . . . , A^(T)) are iid from Poisson{X) and 
prior for A is Gamma{a, (3) with a density 

7r(A) = (A//3)"-iexp(-A//3)/(r(«)/?), 

where r(a) is a gamma function. Substituting the prior density and the likelihood of the 

T 

data ^(n|A) = Y\. e"'^A"*-*Vn(t)! into (1331) . it is easy to find that the posterior is Gamma{a, (3) 
t=i 
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with parameters a = a + X]t=i "'(^) ^^"^ ^ = /^/(l + f^T). The expected number of events, 
given past observations, (which is a mean of the posterior in this case) allows for a good 
interpretation as follows: 

E[N{T + 1) |N] = £;[A|N] = a/3 = wiV + (1 - w) Aq, (34) 

where N = ^ Y.t=i ^(^) is the MLE of A using the observed counts only; Aq = a/? is the 
estimate of A using a prior distribution only (e.g. specified by expert or from external data); 
w = T/{T + l/(3) is the credibility weight in [0,1) used to combine Aq and N. As the number 
of years T increases, the credibility weight w increases and vice versa. That is, the more 
observations we have, the greater credibility weight we assign to the estimator based on the 
observed counts, while the lesser weight is attached to the prior estimate. Also, the larger the 
volatility of the prior (larger /3), the greater the credibility weight assigned to observations. 

One of the features of the Bayesian method is that the variance of the posterior 7r{0\y) 
converges to zero for a large number of observations. This means that the true value of 
the risk profile will be known exactly. However, there are many factors (for example, 
political, economical, legal, etc.) changing in time that will not allow precise knowledge of 
the risk profile. One can model this by allowing parameters to be truly stochastic variables 
as discussed in Section [HI Also, the variance of the posterior distribution can be limited 
by some lower levels (e.g. 5%) as has been done in solvency approaches for the insurance 
industry, see e.g. Swiss Solvency Test [50, formulas (25)-(26)]. 

7.2 Estimating priors 

In general, the structural parameters of the prior distributions can be estimated subjectively 
using expert opinions (pure Bayesian approach) or using data (empirical Bayesian approach). 

Pure Bayesian approach. In a pure Bayesian approach, the prior is specified subjectively 
(i.e. using expert opinions). Berger [40, Section 3.2] lists several methods: 

• Histogram approach: split the space of d into intervals and specify the subjective 
probability for each interval. 

• Relative Likelihood Approach: compare the intuitive likelihoods of the different values 
of 0. 

• CDF determinations: subjectively construct the cumulative distribution function for 
the prior and sketch a smooth curve. 

• Matching a Given Functional Form: find the prior distribution parameters assuming 
some functional form for the prior to match beliefs (on the moments, quantiles, etc) 
as close as possible. 
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The use of a particular method is determined by the specific problem and expert experi- 
ence. Usually, if the expected values for the quantiles (or mean) and their uncertainties are 
estimated by the expert then it is possible to fit the priors; also see [47]. 

Empirical Bayesian approach. The prior distribution can be estimated using the marginal 
distribution of the observations. The data can be collective industry data, collective data 
in the bank, etc. For example, consider a specific risk cell in J banks with the data 
Yj = {Yj{l)^ . . . , Yj{Kj))^ j = 1, . . . , J. Here, Kj is the number of observations in bank 
j. Depending on the set up, these could be annual counts or severities or both. Assume that 
Yj{k), k = 1, . . . , Kj are iid from f{.\6j), for given 6j, and are independent between different 
banks; and Oj, j = 1, . . . , J are iid from the prior 7r(.). That is, the risk cell in the j-th bank 
has its own risk profile 6j, but Oi, . . . ,6j are drawn from the same distribution 7r(.). One 
can say that the risk cells in different banks are the same a priori. Then the likelihood of all 
observations can be written as 



Now, the parameters of TT{Oj) can be estimated by maximizing the above likelihood. The 
distribution 7r{dj) is a prior distribution for the cell in the j-th bank. Then, using internal 
data of the risk cell in the j-th bank, the posterior 7r(0j|yj) is calculated using fl33l) . 

It is not difficult to include a priori known differences (e.g. exposure indicators, expert 
opinions on the differences, etc) between the risk cells from the different banks. As an ex- 
ample, consider the case when the annual frequency of the events in the jth bank is modeled 
by a Poisson distribution with a Gamma prior and observations Nj{k), k = 1, . . . ,Kj, j = 
1, . . . , J. Assume that, for given Xj, Nj{l), . . . , Nj{Kj) are independent and Nj{k) is dis- 
tributed from Poisson{XjVj{k)). Here, Vj{k) is a known constant (i.e. the gross income or 
the volume or combination of several exposure indicators) and Xj is the risk profile of the cell 
in the j-th bank. Assuming further that Ai, . . . , Aj are iid from a common prior distribution 
Gammala, (3), the likelihood of all observations can be written similar to fl35l) and parame- 
ters {a, (3) can be estimated using the maximum likelihood or method of moments; see [47]. 
Often it is easier to scale the actual observations that can be incorporated into the model 
set up as follows. Given data Xj{k), k = 1, . . . , Kj, j = 1, . . . , J (these could be frequencies 
or severities), consider variables Yj{k) = Xj{k) /V j{k). Assume that, for given Oj, Yj{k), 
k = 1, . . . ,Kj are iid from f{.\Oj) and Oi, . . . ,6j are iid from the prior vr(.). Then again 
one can construct the likelihood of all data similar to (1351) and fit the parameters of 7r(.) by 
maximizing the likelihood. 

Example. Suppose that the annual frequency N is modeled by a Poisson distribution 




(35) 
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Poisson{X) and the prior 7r(A) for A is Gamma{a, f3). As described above, the prior can be 
estimated using either expert opinions or external data. The expert may specify the "best" 
estimate for the expected number of events i?[ii^[A^|A]] = E[X\ and an uncertainty that the 
"true" A for next year is within the interval [a,b] with the probability Pr[a < A < 6] = p. 
Then the equations -E'[A] = a(3 and p = J^^7r(A)(iA can be solved numerically to estimate the 
structural parameters a and f3. In the insurance industry, the uncertainty for the "true" A is 
often measured in terms of the coefficient of variation, Vco(A) = A/var(A)/E[A]. Given the 
estimates for E[X] = a(3 and Vco(A) = l/i/a? the structural parameters a and (3 are easily 
estimated. For example, if the expert specifies (or external data imply) that E[X\ = 0.5 and 
Pr[0.25 < A < 0.75] =2/3, then we can fit a prior Gamma{a ~ 3.407, jS ~ 0.147). This prior 
is used in Figure [31 presenting the posterior best estimate for the arrival rate calculated using 
( 15^ and referred to as estimator (b), when the annual counts data N{k), k = 1, ... ,15 are 
simulated from Poisson{0.6). Note that, in Figure [3l the prior is considered to be implied 
by external data. On the same Figure we show the standard MLE, Af^^ = 
referred to as estimator (c). For a small number of observed years the Bayesian estimator 
(b) is more accurate as it takes prior information into account. For a large sample size, 
both the MLE and Bayesian estimators converge to the true value 0.6. Also, the Bayesian 
estimator is more stable (smooth) with respect to bad years. The same behavior is observed 
if the experiment is repeated many times with different sequences of random numbers. This 
and other examples can be found in [47]. 

7.3 Combining three data sources 

In Section mi Bayesian inference was used to combine two data sources, i.e. expert opinion 
with internal data, or external data with internal data. An approach to combine all three data 
sources (internal data, expert opinion and external data) can be accomplished as described 
in Lambrigger et al. [51]. Consider data X and expert opinions v on parameter 6. Then the 
posterior is 

7r(0|x, v) oc £i(x|6>)£2(^^|6')7r(0), (36) 

where £i(x|0) is the likelihood of data given 0, ^2{v\0) is the likelihood of expert opinions 
and vr(0) is the prior density estimated using external data. This posterior for 6 combines 
information from internal data, expert opinions and external data. Here it is assumed 
that given 6, expert opinions are independent from internal data. A more general relation 
7r(0|x, v) oc £(x, v\6)ti{6) can be considered to avoid this assumption. 

For illustration purposes, consider modeling of the annual counts: assume that the annual 
counts A^(l), . . . , N{T) are iid from Poisson{X)\ expert opinions vi, . . . , vm on A are iid from 
Gamma{^, X/C,); and the prior on A is Gamma{a, P). Then the posterior is the generalized 
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inverse gamma density 



n{\\n,v) oc n{\)i{n\\)i{v\\) 
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t=i 



T 



M 



a-l + J2n{t)-M^, 



u; = T+l/P, = e5Z 



n, 



(37) 



t=i 



m=l 



In Figure [3l we show the posterior best estimate for the arrival rate i?[A|N, i;] combining 
three data sources (referred to as estimator (a)) and compare it with the estimator i?[A|N] 
combining internal and external data (referred to as estimator (b), also see (21)). The counts 
N{k), k = 1, ... ,15 are simulated from Poisson{0.6); the assumed prior distribution implied 
by external data is the same as considered in the example in Section \72\ i.e. Gamma{a ~ 
3.41, j3 0.15) such that E[\] = 0.5 and Pr[0.25 < A < 0.75] = 2/3; and there is one expert 
opinion v = 0.7 from the distribution with Vco(t;|A) =0.5, i.e ^ = 4. The standard maximum 
likelihood estimate of the arrival rate A^^^^ = ■^XliLi'^l^) referred to as estimator (c). 
Estimator (a), combining all three data sources, certainly outperforms other estimators and 
is more stable around the true value, especially for small data sample size. All estimators 
converge to the true value as the number of observed years increases. The same behavior is 
observed if the experiment is repeated; see detailed discussions in [51]. 

8 Modeling dependence 

Basel II requires (see [1, p. 152]) that: "i?zsA; measures for different operational risk estimates 
must be added for purposes of calculating the regulatory minimum capital requirement. How- 
ever, the bank may be permitted to use internally determined correlations in operational risk 
losses across individual operational risk estimates, provided it can demonstrate to the sat- 
isfaction of the national supervisor that its systems for determining correlations are sound, 
implemented with integrity, and take into account the uncertainty surrounding any such cor- 
relation estimates (particularly in periods of stress). The bank must validate its correlation 
assumptions using appropriate quantitative and qualitative techniques" . Thus if dependence 
is properly quantified between all risk cells j = 1, . . . , J then, under the LDA model ([1]), the 
capital is calculated as 





otherwise the capital should be estimated as 



J 




(39) 
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Adding up VaRs for capital estimation is equivalent to an assumption of perfect positive 
dependence between the annual losses Zj{T + 1), j = 1, . . . , J. In principle, VaR can be 
estimated at any level of granularity and then the capital is calculated as a sum of resulting 
VaRs. Often banks quantify VaR for business lines and add up these estimates to get capital, 
but for simplicity of notations, (!39|) is given at the level of risk cells. It is expected that 
the capital under (138|) is less than (!39|) : 20% diversification is not uncommon. However, it 
is important to note that VaR is not a coherent risk measure, see Artzner et al. [52]. In 
particular, under some circumstances VaR measure may fail a sub-additivity property 



see Embrechts et al. [53] and [54], i.e. dependence modeling could also increase VaR. As can 
be seen from the literature, the dependence between different ORs can be introduced by: 

• Modeling dependence between the annual counts via a copula, as described in Frachot 
et al. [55], Bee [56], Aue and Klakbrener [10]; 

• Using common shock models to introduce events common across different risks and 
leading to the dependence between frequencies studied in Lindskog and McNeil [57] 
and Powojowski et al. [58]. Dependence between severities occurring at the same time 
is considered in Lindskog and McNeil [57]; 

• Modeling dependence between the /cth severities or between /cth event times of different 
risks; see Chavez-Demoulin et al. [8] (e.g. 1'^*, 2"'^, etc losses/event times of the jth 
risk are correlated to the 1"**, 2"'^, etc losses/event times of the ith risk respectively); 

• Modeling dependence between the annual losses of different risks via copulas; see Gi- 
acometti et al. [59], Embrechts and Puccetti [60]; 

• Using the multivariate compound Poisson model based on Levy copulas suggested in 
Bocker and Kliippelberg [61], [62]; 

• Using structural models with common (systematic) factors that can lead to the depen- 
dence between severities and frequencies of different risks and within risk; 

• Modeling dependence between severities and frequencies from different risks and within 
risk using dependence between risk profiles considered in Peters et al. [44]. 

Below, we describe the main concepts involved into these approaches. 



J 




(40) 
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8.1 Copula 

The concept of a copula is a flexible and general technique to model dependence; for an in- 
troduction see e.g. Joe [63] and Nelson [64]; and for application in financial risk management 
see e.g. [6, Section 5]. In brief, a copula is a (i- dimensional multivariate distribution on [0, l]'^ 
with uniform margins. Given a copula function C{.) and univariate marginal distributions 
. . . , Fd{.), the joint distribution with these margins can be constructed as 

Fixi, ...,Xd)) = . . . , Fd{xd)). (41) 

A well known theorem due to Sklar, published in 1959, says that one can always find a unique 
copula C{.) for a joint distribution with given continuous margins. In the case of discrete 
distributions this copula may not be unique. The most commonly used copula (due its simple 
calibration and simulation) is the Gaussian copula, implied by the multivariate Normal 
distribution. It is a distribution of Ui = Fn{Xi), . . . ,Ud = Fn^Xo), where Fn{.) is the 
standard Normal distribution and Xi, . . . , Xd are from the multivariate Normal distribution 
Fs(.) with zero mean, unit variances and correlation matrix S. Formally, in explicit form, 
the Gaussian copula is 

(Ml, ...,Un) = Fj:{F^\u,), . . . , F^\ud)). (42) 

There are many other copulas (e.g. t-copula, Clayton copula, Gumbel copulas to mention a 
few) studied in academic research and used in practice, that can be found in the referenced 
literature. 

8.2 Dependence between frequencies via copula 

The most popular approach in practice is to consider a dependence between the annual counts 
of different risks via a copula. Assuming a J-dimensional copula C{.) and the marginal 
distributions Pj{.) for the annual counts Ni{t), . . . , Nj{t) leads to a model 

N,it) = P,'\U,m • • • , Njit) = Pj\Uj{t)), (43) 

where f/i(t), . . . , Uj{t) are uniform (0,1) rvs from a copula C(.) and -P,~^(-) is the inverse 
marginal distribution of the counts in the jth risk. Here, t is a discrete time (typically in 
annual units but shorter steps might be needed to calibrate the model) and usually the counts 
are assumed to be independent between different t steps. The approach allows us to model 
both positive and negative dependence between counts. As reported in the literature, the 
imphed dependence between annual losses even for a perfect dependence between counts is 
relatively small and as a result the impact on capital is small too. Some theoretical reasons 
for the observation that frequency dependence has only little impact on the operational 
risk capital charge are given in [61]. As an example, in Figure H] we plot Spearman's rank 
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correlation between the annual losses of two risks, Z\ and Z2, induced by the Gaussian 
copula dependence between frequencies. Marginally, the frequencies A^^i and Ni are from the 
Poisson distributions with the intensities A = 5 and A = 10 respectively and the severities 
are from LN{^ = 1, cr = 2) distributions for both risks. 

8.3 Dependence between aggregated losses via copula 

Dependence between the aggregated losses can be introduced similarly to fH51) . In this 
approach, one can model the aggregated losses as 

Z,{t) = F{\U,{t)), • • • , Zj{t) = Fj\Uj{t)), (44) 

where Ui(t), . . . ,Uj(t) are uniform (0,1) rvs from a copula C(.) and -Pj~^(-) is the inverse 
marginal distribution of the aggregated loss of the j-th risk. Note that the marginal dis- 
tribution Fj{.) should be calculated using frequency and severity distributions. Typically, 
the data are available over several years only and a short time step t (e.g. quarterly) is 
needed to calibrate the model. This dependence modeling approach is probably the most 
flexible in terms of the range of achievable dependencies between risks; e.g. perfect positive 
dependence between the annual losses is achievable. However, note that this approach may 
create difficulties with incorporation of insurance into the overall model. This is because an 
insurance policy may apply to several risks with the cover limit applied to the aggregated 
loss recovery; see Section [91 

8.4 Dependence between the kth event times/losses 

Theoretically, one can introduce dependence between the kth severities or between the kth 
event inter-arrival times or between the kth event times of different risks. For example: 1**, 
2"'^, etc losses of thejth risk are correlated to the 1^*, 2"*^, etc losses of the zth risk respectively 
while the severities within each risk are independent. The actual dependence can be done 
via a copula similar to fH51) . for an accurate description we refer to [8]. Here, we would 
like to note that a physical interpretation of such models can be difficult. Also, an example 
of dependence between annual losses induced by dependence between the kth inter-arrival 
times is presented in Figure HI 

8.5 Modeling dependence via Levy copulas 

Bocker and Kliippelberg [61,62] suggested to model dependence in frequency and severity 
between different risks at the same time using a new concept of Levy copulas, see e.g. [65, 
Sections 5.4-5.7]. It is assumed that each risk follows to a univariate compound Poisson 
process (that belongs to a class of Levy processes). Then, the idea is to introduce the 
dependence between risks in such a way that any conjunction of different risks constitutes 
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a univariate compound Poisson process. It is achieved using the multivariate compound 
Poisson processes based on Levy copulas. Note that, if dependence between frequencies or 
annual losses is introduced via copula as in fH51) or flUl) . then the conjunction of risks does 
not follow to a univariate compound Poisson. 

The precise definitions of Levy measure and Levy copula are beyond the purpose of this 
paper and can be found in the above mentioned literature. Here, we would like to mention 
that in the case of a compound Poisson process Levy measure is the expected number of 
losses per unit of time with a loss amount in a pre-specified interval, Tlj{x) = Xj Pr(Xj > x). 
Then the multivariate Levy measure can be constructed from the marginal measures and a 
Levy copula C as 

n(xi, ...,xd) = c'(ni(xi), . . . , tidixd)) (45) 

which is somewhat similar to fj4T|) in a sense that the dependence structure between different 
risks can be separated from the marginal processes. However, it is quite a different concept. 
In particular, a Levy copula for processes with positive jumps is [0, oo)'* [0, oo) mapping 
while a standard copula fllT]) is [0, 1]°' [0, 1] mapping. Also, a Levy copula controls 
dependence between frequencies and dependence between severities (from different risks) at 
the same time. The interpretation of this model is that dependence between different risks is 
due to the loss events occurring at the same time. Important implication of this approach is 
that a total bank's loss can be modeled as a compound Poisson process with some intensity 
and iid severities. If this common severity distribution is sub-exponential then closed-form 
approximation fl23l) can be used to estimate VaR of the total annual loss in a bank. 

8.6 Structural model with common factors 

The use of common (systematic) factors is useful to identify dependent risks and to reduce 
the number of required correlation coefficients that must be estimated, see e.g. [6, Section 
3.4]. Structural models with common factors to model dependence are widely used in credit 
risk, see industry examples in [6, Section 8.3.3]. For OR, these models are qualitatively 
discussed in Marshall [66, Sections 5.3 and 7.4] and there are unpublished examples of a 
practical implementation. As an example, assume a Gaussian copula for the annual counts 
of different risks and consider one common (systematic) factor Q{t) affecting the counts as 
follows: 

Y,it) = p,m + \l^-p]W,{t), j = l,...,J; 

N,{t) = P{'{Fr,{Y,m,---,Nj{t) = Pj'{F^{Yjm- (46) 

Here, W^i(t), . . . , Wj{t) and VL{t) are independent rvs from the standard Normal distribution. 
All rvs are independent between different time steps t. Given VL{t), the counts are indepen- 
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dent but unconditionally the risk profiles are dependent if the corresponding pj are nonzero. 
In this example, one should identify J correlation parameters pj only instead of J{J — l)/2 
parameters of the full correlation matrix. 

Extension of this approach to many factors Jlfe(t), A; = 1, . . . , is easy: 



K 



k=l 



K 



1 -^PjkPjmCOY{Slk{t)^l^{t))Wj(t), (47) 
fe=l 

where Vli{t),. . . ,flK{t) are from a multivariate Normal distribution with zero means, unit 
variances and some correlation matrix. This approach can also be extended to introduce a 
dependence between both severities and frequencies. For example, in the case of one factor, 
one can structure the model as follows: 



Yj{t) = pjQ{t) + ^l-p'^Wj{t), i = l,...,J; 

N,{t) = Pri(Fjv(lS(t))), j = l,...,J; 



Rf\t) = pjQ{t) + ^l-p]v}'\t), s=l,...,Nj{t), j = l,...,J; 
X^/\t) = Fr' (^FN{Rf\t))y s^l,...,Nj{t), j = l,...,J. 

Here: Wj{t), vl'\t), s = 1, . . . , Nj{t), j = 1, . . . , J and n{t) are iid from the standard Nor- 
mal distribution. Again, the logic is that there is a factor affecting severities and frequencies 
within a year such that conditional on this factor, severities and frequencies are indepen- 
dent. The factor is changing stochastically from year to year, so that unconditionally there 
is dependence between frequencies and severities. Also note that in such setup, there is a 
dependence between severities within a risk category. Often, common factors are unobserv- 
able and practitioners use generic intuitive definitions such as: changes in political, legal 
and regulatory environments, economy, technology, system security, system automation, etc. 
Several external and internal factors are typically considered, so that some of the factors 
affect frequencies only (e.g. system automation), some factors affect severities only (e.g. 
changes in legal environment) and some factors affect both the frequencies and severities 
(e.g. system security). 

8.7 Stochastic and dependent risk profiles 

Consider the LDA for risk cells j — 1, . . . , J: 

Z,(t)= (t), i=l,2,... , (48) 

s=l 

where Nj{t) ~ Pj{-\Xt^) and X^/\t) ~ Fj(.|^p'^). Hereafter, notation X ~ F{.) means 
that X is a rv from distribution F{.). It is realistic to consider that the risk profiles A* = 
{Xt^\ . . . , A|'^^) and il)t = {'ipt^\ ■ ■ ■ , i't^^) are not constant but changing in time stochastically 
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due to changing risk factors (e.g. changes business environment, pohtics, regulations, etc). 
Also it is realistic to say that risk factors affect many risk cells and thus the risk profiles are 
dependent. One can model this by assuming some copula C{.) and marginal distributions 
for the risk profiles (also see [67]), i.e. the joint distribution between the risk profiles is 

F (A(t), ^Pit)) = C (Gi(Ai(t)), . . . , Gj(Aj(t)), i/i(V^i(t)), . . . , HjiMt))) , (49) 

where Gj{.) and Hj{.) are the marginal distributions of Xjit) and ipjit) respectively. Depen- 
dence between the risk profiles will induce a dependence between the annual losses. This 
general model can be used to model dependence between the annual counts; between the 
severities of different risks; between the severities within a risk; and between the frequencies 
and severities. The likelihood of data (counts and severities) can be derived but involves a 
multidimensional integral with respect to latent variables (risk profiles). Advanced MCMC 
methods (such as the Slice Sampler method used in [67]) can be used to fit the model. For 
example, consider the bivariate case (J = 2) where: 

• Frequencies Nj{t) ~ Poisson{Xj{t)) and severities Xj'^\t) ~ LN{fij{t),aj(t)); 

• Ai(t) ~ Gamma{2.5,2), X2{t) ~ Gamma{5,2), ^j{t) ~ A^orma/(l, 1), (Jj{t) = 2; 

• The dependence between Xi(t), A2(t), /Ui(i) and fi2{t) is a Gaussian copula. 

Figure [5] shows the induced dependence between the annual losses Zi(t) and Z2(t) vs the 
copula dependence parameter for three cases: if only Ai(t) and A2(t) are dependent; if only 
/ii(t) and fi2(t) are dependent; if the dependence between Ai(t) and A2(t) is the same as 
between /ii(t) and fj.2{t). In all cases the dependence is Gaussian copula. 

8.8 Common shock processes 

Modeling OR events affecting many risk cells can be done using common shock process mod- 
els; see Johnson et al. [68, Section 37]. In particular, consider J risks with the event counts 
Nj{t) = N^'^\t) + Nj{t), where Nj{t), j = 1, . . . , J and N^'^\t) are generated by indepen- 
dent Poisson processes with intensities Xj and Ac respectively. Then Nj{t), j = 1, . . . , J 
are Poisson distributed with intensities Xj = Xj + Ac marginally and are dependent via 
the common events N^'^^t). The linear correlation and covariance between risk counts are 
p(Ni{t), Nj{t)) = Ac/ \/XiXj and cov(A^j(t), Nj(t)) = Ac respectively. Only a positive depen- 
dence between counts can be modeled using this approach. Also, note that the covariance for 
any pair of risks is the same though the correlations are different. More flexible dependence 
can be achieved by allowing a common shock process to contribute to the k-th risk process 
with some probability pk', then cov{Ni{t) , Nj{t)) = XcPiPj- This method can be generalized 
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to many common shock processes; see [57] and [58]. It is also reasonable to consider the de- 
pendence between the severities in different risk cells that occurred due to the same common 
shock event. 



9 Insurance 

Many ORs are insured. If a loss occurred and it is covered by an insurance policy then part 
of the loss will be recovered. A typical policy will provide a recovery R for a loss X subject 
to the excess amount (deductible) D and top cover limit amount U as follows: 



That is the recovery will take place if the loss is larger than the excess and the maximum 
recovery that can be obtained from the policy is U. Note that in fl50l) . the time of the event 
is not involved and the top cover limit applies for a recovery per risk event, i.e. for each 
event the obtained recovery is subject of the top cover limit. Including insurance into the 
LDA is simple; the loss severity in ([T]) should be simply reduced by the amount of recovery 
(!50|) and can be viewed as a simple transformation of the severity. However, there are 
several difficulties in practice. Policies may cover several different risks and different policies 
may cover the same risk. The top cover limit may apply for the aggregated recovery over 
many events of one or several risks (e.g. the policy will pay the recovery for losses until the 
top cover limit is reached by accumulated recovery). These aspects and special insurance 
haircuts required by Basel II [1] make recovery dependent on time. Accurate modeling 
insurance accounting for practical details requires modeling the event times rather than the 
annual counts only, e.g. a Poisson process can be used to model the event times. It is not 
difficult to incorporate the insurance into an overall model if a Monte Carlo method is used 
to quantify the annual loss distributions. 

The Basel II requires that the total capital reduction due to the insurance recoveries is 
capped by 20%. Incorporating insurance into the LDA is not only important for capital 
reduction but also beneficial for negotiating a fair premium with the insurer because the 
distribution of the recoveries and its characteristics can be estimated. 

10 Capital charge via full predictive distribution 

Consider the annual loss in a bank (or the annual loss at a different level depending on 
where the 0.999 quantiles are quantified; see Section [H]) over the next year, Z(T+ 1). Denote 
the density of the annual loss, conditional on parameters 6, as f{z{T + 1)|^). Typically, 




0, if < X < D; 
X-D, if D<X<U + D; 
f/, ifD + U <X. 



(50) 
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given observations, the MLEs are used as the "best fit" point estimators for 0. Then the 
annual loss distribution for the next year is estimated as f{z{T + 1)\6) and its 0.999 quantile, 
(50.999(^)5 is used for the capital charge calculation. 

However, the parameters 6 are unknown and it is important to account for this uncer- 
tainty when capital charge is estimated (especially for risks with small datasets) as discussed 
in [69]. If Bayesian inference is used to quantify the parameters through their posterior 
distribution 7r(0|y), then the full predictive density (accounting for parameter uncertainty) 
of Z{T + 1), given all data Y used in the estimation procedure, is 

f{z{T + l)|y) = j f{z{T + l)\e) X n{e\y)de. (51) 

Here, it is assume that, given parameters 6, Z{T + 1) and Y are independent. If a frequentist 
approach is taken to estimate the parameters, then 6 should be replaced with 6 and the 
integration should be done with respect to the density of parameter estimators 0. The 0.999 
quantile of the full predictive distribution (IHT]) . 

= ^z{t+i)|y(?) = inf : Pt[Z{T + 1) > ^| Y] < 1 - g}, g = 0.999, (52) 

can be used as a risk measure for capital calculations. 

Another approach under a Bayesian framework to account for parameter uncertainty is 
to consider a quantile Qo.gggiO) of the conditional annual loss density f{-\9): 

Qg{0) = F^(V+i)|0(g) = inf{^ : Pr[Z(T + 1) > z\0] <l-q}, q = 0.999. (53) 

Then, given that is distributed as 7r(0|y), one can find the distribution of Qo.999(^) and 
form a predictive interval to contain the true value with some probability. This is similar to 
forming a confidence interval in the frequentist approach using the distribution of Qo.999(^); 
where 6 is treated as random (usually, the Gaussian approximation (12^ is assumed for 0). 
Often, if derivatives can be calculated efficiently, the variance of Qo.999{0) is simply estimated 
via an error propagation method and a first order Taylor expansion). Here, one can use 
deterministic algorithms such as FFT or Panjer recursion to calculate (5o.999(^) efficiently. 
Under this approach, one can argue that the conservative estimate of the capital charge 
accounting for parameter uncertainty should be based on the upper bound of the constructed 
interval. Note that specification of the confidence level is required and it might be difficult 
to argue that the commonly used confidence level 0.95 is good enough for estimation of the 
0.999 quantile. 

In OR, it seems that the objective should be to estimate the full predictive distribution 
( |5T|) for the annual loss Z{T + 1) over next year conditional on all available information and 
then estimate the capital charge as a quantile Qq qqq of this distribution (1521 

Consider a risk cell in the bank. Assume that the frequency p{-\cx.) and severity /(.|/3) 
densities for the cell are chosen. Also, suppose that the posterior distribution n{6\-y), = 
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(q;,/3) is estimated. Then, the full predictive annual loss distribution flSTl) in the cell can be 
calculated using Monte Carlo procedure with the following logical steps: 

1. For k = 1,. . . ,K 

(a) For a given risk simulate the risk parameters 6 = {ct,l3) from the posterior 
7r(0|y). If the posterior is not known in closed form then this simulation can be 
done using MCMC (see Section [6.2p . For example, one can run MCMC for K 
iterations beforehand and simply take the kth iteration parameter values. 

(b) Given = {a, f3), simulate the annual number of events from p{-\a.) and sever- 
ities . . . , X(^) from and calculate the annual loss Z(^) = ^^^^ 

2. Next k 

Obtained annual losses Z^^\ . . . , Z^^'^ are samples from the full predictive density (|5T|) . Ex- 
tending the above procedure to the case of many risks is easy but requires specification of the 
dependence model, see Section [SI In this case, in general, all model parameters (including 
the dependence parameters) should be simulated from their joint posterior in Step 1. Then, 
given these parameters. Step 2 should simulate all risks with a chosen dependence structure. 
In general, sampling from the joint posterior of all model parameters can be accomplished 
via MCMC, see e.g. [67,70]. The 0.999 quantile Qq qqq and other distribution characteristics 
can be estimated using the simulated samples in the usual way, see Section 15. 1[ 

Note that in the above Monte Carlo procedure the risk profiles a and (3 are simulated 
from their posterior distribution for each simulation. Thus, we model both the process risk 
(process uncertainty), which comes from the fact that frequencies and severities are rvs, and 
the parameter risk (parameter uncertainty), which comes from the fact that we do not know 
the true values of = {ex., (3). To calculate the conditional density f{z\0) and its quantile 
Qo.999(^) using parameter point estimators 0, step 1 in the above procedure should be simply 
modified by setting 6 = 6 for all simulations k = 1, . . . , K. Thus, Monte Carlo calculations 
of Qoggg and Qo.999(^) are similar, given that 7i{6\y) is known. If 7r(0|y) is not known in 
closed form then it can be estimated efficiently using Gaussian approximation or available 
MCMC algorithms; see Section O 

The parameter uncertainty is ignored by the estimator Qo.999(^) but is taken into account 
by Qo ggg. Flgurc [6] presents results for the relative bias (averaged over 100 realizations) 
^[<5a999 - Qo.999i0)]/Q^°\ where 6 is MLE, Q(°) is the quantile of /(.|0o) and is the true 
value of the parameter. The frequencies and severities are simulated from Poisson{XQ = 10) 
and LN{hq = l,cro = 2) respectively. Also, constant priors are used for the parameters so 
that there are closed form expressions for the posterior. In this example, the bias induced by 
parameter uncertainty is large: it is approximately 10% after 40 years (i.e. approximately 
400 data points) and converges to zero as the number of losses increases. 
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The parameter values used in the example may not be typical for some ORs. One should 
do the above analysis with real data to find the impact of parameter uncertainty. For 
example a similar analysis for a multivariate case was performed in [70] with real data. For 
high frequency low impact risks, where a large amount of data is available, the impact is 
certainly expected to be small. However for low frequency high impact risks, where the data 
are very limited, the impact can be significant. Also, see Mignola and Ugoccioni [71] for 
discussion of uncertainties involved in OR estimation. 

11 Conclusions 

In this paper we reviewed some methods suggested in the literature for the LDA imple- 
mentation. We emphasized that Bayesian methods can be well suited for modeling OR. 
In particular, Bayesian framework is convenient to combine different data sources (internal 
data, external data and expert opinions) and to account for the relevant uncertainties. Accu- 
rate quantification of the dependences between ORs is a difficult task with many challenges 
to be resolved. There are many aspects of the LDA that may require sophisticated statistical 
methods and different approaches are hotly debated. 
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Basel II business lines (BL) 



Basel II event types (ET) 



• Corporate finance — 0.18) 

• Trading & Sales (/^a = 0.18) 

• Retail banking % = 0.12) 

• Commercial banking (/54 = 0.15) 

• Payment & Settlement (/^s = 0.18) 

• Agency Services (/Sg = 0.15) 

• Asset management {(S-j = 0.12) 

• Retail brokerage {13s — 0.12) 



Table 1: Basel 11 business lines and event types. Pi, ... ,(^8 are the business line factors used 
in the Basel 11 Standardised Approach. 
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• Internal fraud 

• External fraud 

• Employment practices and 
workplace safety 

• Clients, products and business practices 

• Damage to physical assets 

• Business disruption and system failures 

• Execution, delivery and 
process management 
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Figure 1: Relative bias in the 0.999 quantile of the annual loss vs % of truncated points 
for several models ignoring truncation in the case of hght tail severities from LN{S, 1). The 
annual counts above the truncation level are from Poisson{10). 
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Figure 2: Relative bias in the 0.999 quantile of the annual loss vs % of truncated points for 
several models ignoring truncation in the case of heavier tail severities from LN[3,2). The 
annual counts above the truncation level are from Poisson{10). 
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Figure 3: Three estimators for the Poisson arrival rate vs the observation year: (a) Bayesian 
estimator combining internal data, expert opinion i? = 0.7 and external data; (b) Bayesian 
estimator combining internal data and external data; (c) MLE based on internal data 
only. The internal data annual counts (0,0,0,0,1,0,1,1,1,0,2,1,1,2,0) were sampled from the 
Poisson{0.6). The prior implied by external data is Gamma{a, (3) with mean = 0.5. 
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Figure 4: Spearman's rank correlation between the annual losses ps{Zi, Z2) vs the Gaussian 
copula parameter p: (a) - copula between counts A^i and N2; (b) - copula between inter- 
arrival times of two Poisson processes. Marginally, the frequencies are from Poisson{5) and 
Poisson{10) respectively and the severities are iid from LN{1, 2) for both risks. 
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Figure 5: Spearman's rank correlation ps{Zi,Z2) between annual losses vs the Gaussian 
copula parameter p: (a) - copula for the frequency profiles Ai and A2; (b) - copula for the 
severity profiles /ii and 112', (c) - copula for Ai and A2 and the same copula for /ii and 112 
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Figure 6: Relative bias (average over 100 reahzations) in the 0.999 quantile of the annual 
loss induced by the parameter uncertainty vs the number of observation years. Losses were 
simulated from Poisson{10) and LN{1,2). 
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